Segmentation Analysis

Case Study: Stratton AE Banking

Published

July 20, 2025

1 Introduction

1.1 Company

Stratton AE-Banking is a newly founded online bank in the US market. The e-banking service is a joint venture of a young fintech start-up and the long-established New York private banking house Stratton & Fils. The joint venture was founded in 2020 and has since attracted great interest with its digital private banking services. It relies on an AI-driven recommender engine that combines past investment information with a market and finance machine learning engine to derive investment tips and portfolio suggestions for its customers. So far, the fintech start-up has been successful in approaching young investors and customers. Through the joint venture with Stratton & Fils, the fintech hopes to also attract existing customers of the established bank.

However, the conservative management of Stratton & Fils is extremely worried about simply approaching all of its customers. It fears that the data-driven, digital customer experience of Stratton AE may unsettle some of its long-standing customers and harm the long-established and very intimate customer relations, which are believed to be an essential factor in the bank’s success.

Management thus approaches you as the head of the data science team and asks you to conduct a segmentation analysis of the bank’s existing customer base and to identify suitable customer segments that might be open to trying out Stratton & Fils’ joint venture. As a basis for your segmentation analysis, the CRM manager provides you with the following data.

1.2 Data

```{r}
library(tidyverse)
library(gt)
theme_set(theme_light())
```
Table 8.1: Stratton & Fils CRM Data

| Variable | Description | Measurement |
|---|---|---|
| Age | Customer age | Age in years |
| Income | Household net income | Net income in USD |
| HouseholdSize | Number of people living in the household | Integer number |
| CityAreaSize | City or main area population | Integer number |
| MeanCityIncome | Average income on ZIP-code and street level | Average income in USD |
| MeanCityHousePrize | Average house prices on ZIP-code and street level from the last 5 years | Average price in USD |
| MeanCityHouseHoldSize | Average household size on ZIP-code and street level from the last 10 years | Average number of inhabitants |
| MeanCitySqFtPrice | Average price per square foot on ZIP-code and street level | Average price in USD |
| NumberCars | Number of cars registered to the customer | Number of cars |
| InternetTrafficVolume | Volume of internet traffic per customer household | GB |
| MortageVolume | Mortgage to be paid by the customer | USD |
| AccountSpending | Monthly average spending from the bank account | USD |
| CreditCardSpending | Monthly average spending from the credit card | USD |
| HelpHotlineTime | Number of minutes spent with the banking hotline | Minutes |
| CustomerSince | Time since opening the bank account | Months |
| GrocerySpending | Average grocery-related spending from the bank account | USD |
| StockVolume | Stock investment | USD |
| CreditVolume | Credits with the bank | USD |
| NASDAQInvest | Amount of money invested in NASDAQ-listed companies | USD |
| USAXSFundInvest | Amount of money invested in the Stratton-owned share fund for mid-sized US companies | USD |
| BranchVisits | Number of recorded branch visits within the last 8 weeks | Integer number |
| AppLogins | Number of customer logins in the mobile banking app within the last 8 weeks | Integer number |
| ATMVisits | Number of times the customer used an ATM service point within the last 8 weeks | Integer number |
| TimeOnlineBanking | Time logged into the online banking system | Minutes |
| ServiceFees | Extra fees paid for banking services | USD |
| SocialMediaInter | Number of finance-specific social media profiles the customer follows | Integer number |
| Bitcoins | Number of Bitcoins held by the customer | Number |
| NFTs | Number of NFTs bought by the customer | Integer number |

We can now load the data in R with the read_csv() function and then inspect the data frame with the summary(), str(), and skimr::skim() functions.

Code
#Import Data
BankinCRMData <- read_csv("Data/StrattonAEBankingCRM.csv")
summary(BankinCRMData)
      Age            Income       HouseholdSize   CityAreaSize   
 Min.   :18.00   Min.   : 35202   Min.   :1.00   Min.   : 61613  
 1st Qu.:23.00   1st Qu.: 42803   1st Qu.:2.00   1st Qu.:121704  
 Median :30.00   Median : 71268   Median :3.00   Median :450100  
 Mean   :35.16   Mean   : 84700   Mean   :2.81   Mean   :372196  
 3rd Qu.:45.00   3rd Qu.:125870   3rd Qu.:4.00   3rd Qu.:459418  
 Max.   :74.00   Max.   :181863   Max.   :8.00   Max.   :708729  
 MeanCityIncome   MeanCityHousePrize MeanCityHouseHoldSize MeanCitySqFtPrice
 Min.   : 35372   Min.   : 125011    Min.   :1.000         Min.   :1871     
 1st Qu.:116253   1st Qu.: 444817    1st Qu.:2.000         1st Qu.:2627     
 Median :140458   Median : 614601    Median :3.000         Median :5778     
 Mean   :163706   Mean   : 942505    Mean   :3.023         Mean   :5318     
 3rd Qu.:235000   3rd Qu.:1849915    3rd Qu.:4.000         3rd Qu.:6741     
 Max.   :286996   Max.   :1850000    Max.   :8.000         Max.   :9886     
   NumberCars    InternetTrafficVolume MortageVolume    AccountSpending 
 Min.   :0.000   Min.   :  6.00        Min.   : 14898   Min.   : 500.0  
 1st Qu.:1.000   1st Qu.: 45.00        1st Qu.:120462   1st Qu.: 560.1  
 Median :1.000   Median : 60.00        Median :232414   Median : 898.0  
 Mean   :1.384   Mean   : 67.57        Mean   :202824   Mean   :1275.7  
 3rd Qu.:2.000   3rd Qu.: 86.00        3rd Qu.:287298   3rd Qu.:1647.6  
 Max.   :4.000   Max.   :118.00        Max.   :605846   Max.   :4257.1  
 CreditCardSpending HelpHotlineTime     CustomerSince   GrocerySpending 
 Min.   : 501.1     Min.   : 0.006058   Min.   : 0.00   Min.   : 150.1  
 1st Qu.: 651.1     1st Qu.: 4.577513   1st Qu.: 3.00   1st Qu.: 293.4  
 Median : 785.7     Median : 8.774480   Median :11.00   Median : 426.5  
 Mean   :1013.3     Mean   :12.816409   Mean   :19.25   Mean   : 535.8  
 3rd Qu.:1451.4     3rd Qu.:16.593896   3rd Qu.:36.00   3rd Qu.: 627.9  
 Max.   :2041.9     Max.   :60.754994   Max.   :74.00   Max.   :1253.5  
  StockVolume    CreditVolume     NASDAQInvest    USAXSFundInvest  
 Min.   : 388   Min.   : 117.3   Min.   : 228.4   Min.   :  69.95  
 1st Qu.:1059   1st Qu.: 161.7   1st Qu.: 401.4   1st Qu.: 149.80  
 Median :1537   Median : 802.7   Median :1498.1   Median : 313.82  
 Mean   :2142   Mean   :1330.1   Mean   :1828.5   Mean   : 761.65  
 3rd Qu.:2505   3rd Qu.:2488.0   3rd Qu.:3056.0   3rd Qu.:1060.23  
 Max.   :5738   Max.   :3532.3   Max.   :4532.4   Max.   :3396.61  
  BranchVisits      AppLogins       ATMVisits      TimeOnlineBanking
 Min.   : 0.000   Min.   :  1.0   Min.   : 0.000   Min.   : 22.77   
 1st Qu.: 2.000   1st Qu.: 18.0   1st Qu.: 3.000   1st Qu.: 69.28   
 Median : 3.000   Median : 64.0   Median : 5.000   Median : 88.26   
 Mean   : 3.913   Mean   : 55.7   Mean   : 4.928   Mean   :113.88   
 3rd Qu.: 5.000   3rd Qu.: 82.0   3rd Qu.: 7.000   3rd Qu.:152.92   
 Max.   :20.000   Max.   :130.0   Max.   :11.000   Max.   :232.21   
  ServiceFees       SocialMediaInter    Bitcoins           NFTs       
 Min.   :  0.1343   Min.   : 0.00    Min.   :0.0000   Min.   : 0.000  
 1st Qu.: 17.8442   1st Qu.: 5.00    1st Qu.:0.0005   1st Qu.: 1.000  
 Median : 27.2386   Median :16.00    Median :0.0998   Median : 3.000  
 Mean   : 40.9382   Mean   :19.03    Mean   :0.1937   Mean   : 3.317  
 3rd Qu.: 50.1652   3rd Qu.:31.00    3rd Qu.:0.4005   3rd Qu.: 4.000  
 Max.   :124.2613   Max.   :60.00    Max.   :0.6014   Max.   :12.000  
Code
str(BankinCRMData)
spc_tbl_ [10,750 × 28] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Age                  : num [1:10750] 40 37 71 53 40 32 72 29 55 38 ...
 $ Income               : num [1:10750] 79623 71616 78524 69938 74244 ...
 $ HouseholdSize        : num [1:10750] 2 5 1 3 1 7 5 3 2 3 ...
 $ CityAreaSize         : num [1:10750] 454686 452465 456594 456594 452004 ...
 $ MeanCityIncome       : num [1:10750] 90668 156742 52484 118422 36227 ...
 $ MeanCityHousePrize   : num [1:10750] 1849978 1849599 1849953 1849302 1849247 ...
 $ MeanCityHouseHoldSize: num [1:10750] 3 5 3 4 6 3 2 6 3 2 ...
 $ MeanCitySqFtPrice    : num [1:10750] 3813 5264 3405 2141 2160 ...
 $ NumberCars           : num [1:10750] 2 2 1 0 1 1 2 3 3 0 ...
 $ InternetTrafficVolume: num [1:10750] 58 36 57 71 39 52 62 41 65 67 ...
 $ MortageVolume        : num [1:10750] 430299 378228 282232 394235 350471 ...
 $ AccountSpending      : num [1:10750] 938 1128 931 1002 1171 ...
 $ CreditCardSpending   : num [1:10750] 1418 694 1281 1135 1109 ...
 $ HelpHotlineTime      : num [1:10750] 3.22 3.44 2.47 5.9 5.1 ...
 $ CustomerSince        : num [1:10750] 36 36 36 36 37 36 37 36 36 36 ...
 $ GrocerySpending      : num [1:10750] 433 594 574 561 309 ...
 $ StockVolume          : num [1:10750] 1118 1392 1117 1354 1037 ...
 $ CreditVolume         : num [1:10750] 809 803 791 780 804 ...
 $ NASDAQInvest         : num [1:10750] 1488 1504 1500 1496 1500 ...
 $ USAXSFundInvest      : num [1:10750] 476 490 465 488 450 ...
 $ BranchVisits         : num [1:10750] 3 4 3 4 3 4 3 3 4 4 ...
 $ AppLogins            : num [1:10750] 10 19 14 14 16 13 11 16 20 8 ...
 $ ATMVisits            : num [1:10750] 9 8 8 8 9 8 9 7 8 7 ...
 $ TimeOnlineBanking    : num [1:10750] 71.5 67.1 58.9 61 67.2 ...
 $ ServiceFees          : num [1:10750] 41.8 52.2 54.6 41.8 57.1 ...
 $ SocialMediaInter     : num [1:10750] 27 25 28 34 34 24 21 32 28 24 ...
 $ Bitcoins             : num [1:10750] 0.0032 0.0037 0.0136 0.0016 0.0075 0.003 0.0076 0.0055 0.0036 0.0026 ...
 $ NFTs                 : num [1:10750] 2 1 1 1 0 3 2 4 3 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   Age = col_double(),
  ..   Income = col_double(),
  ..   HouseholdSize = col_double(),
  ..   CityAreaSize = col_double(),
  ..   MeanCityIncome = col_double(),
  ..   MeanCityHousePrize = col_double(),
  ..   MeanCityHouseHoldSize = col_double(),
  ..   MeanCitySqFtPrice = col_double(),
  ..   NumberCars = col_double(),
  ..   InternetTrafficVolume = col_double(),
  ..   MortageVolume = col_double(),
  ..   AccountSpending = col_double(),
  ..   CreditCardSpending = col_double(),
  ..   HelpHotlineTime = col_double(),
  ..   CustomerSince = col_double(),
  ..   GrocerySpending = col_double(),
  ..   StockVolume = col_double(),
  ..   CreditVolume = col_double(),
  ..   NASDAQInvest = col_double(),
  ..   USAXSFundInvest = col_double(),
  ..   BranchVisits = col_double(),
  ..   AppLogins = col_double(),
  ..   ATMVisits = col_double(),
  ..   TimeOnlineBanking = col_double(),
  ..   ServiceFees = col_double(),
  ..   SocialMediaInter = col_double(),
  ..   Bitcoins = col_double(),
  ..   NFTs = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
Code
skimr::skim(BankinCRMData)
Data summary
Name BankinCRMData
Number of rows 10750
Number of columns 28
_______________________
Column type frequency:
numeric 28
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Age 0 1 35.16 13.54 18.00 23.00 30.00 45.00 74.00 ▇▃▂▃▁
Income 0 1 84699.60 49337.11 35202.00 42803.25 71268.00 125870.00 181863.00 ▇▇▁▂▃
HouseholdSize 0 1 2.81 1.33 1.00 2.00 3.00 4.00 8.00 ▇▅▃▁▁
CityAreaSize 0 1 372196.13 228010.19 61613.00 121703.75 450100.00 459418.00 708729.00 ▇▂▁▇▅
MeanCityIncome 0 1 163706.02 64581.64 35372.00 116252.75 140457.50 235000.00 286996.00 ▂▅▇▃▆
MeanCityHousePrize 0 1 942505.00 713686.15 125011.00 444816.50 614600.50 1849915.00 1850000.00 ▇▆▁▁▇
MeanCityHouseHoldSize 0 1 3.02 1.34 1.00 2.00 3.00 4.00 8.00 ▇▅▅▁▁
MeanCitySqFtPrice 0 1 5318.27 2394.52 1871.00 2627.00 5778.50 6741.00 9886.00 ▇▃▇▂▅
NumberCars 0 1 1.38 0.93 0.00 1.00 1.00 2.00 4.00 ▂▇▃▂▁
InternetTrafficVolume 0 1 67.57 33.10 6.00 45.00 60.00 86.00 118.00 ▅▆▆▇▇
MortageVolume 0 1 202823.81 138105.98 14898.00 120461.75 232413.50 287297.50 605846.00 ▇▇▅▃▁
AccountSpending 0 1 1275.73 978.17 500.00 560.14 897.96 1647.60 4257.14 ▇▂▁▁▁
CreditCardSpending 0 1 1013.33 469.02 501.11 651.08 785.67 1451.37 2041.85 ▇▂▁▂▂
HelpHotlineTime 0 1 12.82 11.12 0.01 4.58 8.77 16.59 60.75 ▇▂▂▁▁
CustomerSince 0 1 19.25 21.68 0.00 3.00 11.00 36.00 74.00 ▇▁▂▁▂
GrocerySpending 0 1 535.81 300.88 150.10 293.43 426.48 627.91 1253.50 ▇▅▅▁▃
StockVolume 0 1 2142.25 1496.22 387.97 1058.73 1536.71 2505.16 5738.39 ▇▅▂▁▂
CreditVolume 0 1 1330.12 1196.72 117.30 161.72 802.75 2487.98 3532.34 ▇▂▂▃▂
NASDAQInvest 0 1 1828.48 1447.34 228.35 401.40 1498.10 3056.05 4532.36 ▇▆▁▅▂
USAXSFundInvest 0 1 761.65 836.46 69.95 149.80 313.82 1060.23 3396.61 ▇▂▂▁▁
BranchVisits 0 1 3.91 3.22 0.00 2.00 3.00 5.00 20.00 ▇▂▁▁▁
AppLogins 0 1 55.70 34.64 1.00 18.00 64.00 82.00 130.00 ▇▃▇▃▃
ATMVisits 0 1 4.93 2.32 0.00 3.00 5.00 7.00 11.00 ▅▇▅▇▁
TimeOnlineBanking 0 1 113.88 59.18 22.77 69.28 88.26 152.92 232.21 ▅▇▅▅▃
ServiceFees 0 1 40.94 31.09 0.13 17.84 27.24 50.17 124.26 ▇▆▂▁▂
SocialMediaInter 0 1 19.03 17.05 0.00 5.00 16.00 31.00 60.00 ▇▅▃▁▂
Bitcoins 0 1 0.19 0.22 0.00 0.00 0.10 0.40 0.60 ▇▁▁▂▂
NFTs 0 1 3.32 2.69 0.00 1.00 3.00 4.00 12.00 ▇▆▂▂▁

2 Distance as a measure of similarity

To identify segments of similar customers, let us first focus on the question of how to measure similarity. Table 2 shows some observations for customers from another banking database. The columns show the values of some customer-related attributes. We can use these attribute values to calculate a so-called distance measure, which shows how similar or dissimilar two customers are. The higher the distance, the more dissimilar they are. For continuous variables, we can use the basic Euclidean Distance to derive similarities. The Euclidean Distance between two customers A and B, where \(f_{i,A}\) denotes the value of attribute \(i\) for customer A and \(n\) is the number of attributes, can be expressed by the following equation.

\[ED_{A,B}= \sqrt{(f_{1,A}-f_{1,B})^{2}+(f_{2,A}-f_{2,B})^{2}+\dots+(f_{n,A}-f_{n,B})^{2}} \]

2.1 Manual Calculation

Code
Table2_1

We can now use the formula of the Euclidean Distance to calculate, for example, the distance between Hawkeye and Potter.

Code
ED_Hawkeye_Potter = sqrt((32-64)^2 + (45-75)^2+ (25-10)^2 + (1-3)^2) 
ED_Hawkeye_Potter
[1] 46.40043
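
The same calculation can be written as a small reusable function. The following is a minimal sketch: the function name euclidean_distance and the vector names are illustrative assumptions, and the attribute values are simply the ones used in the manual calculation above.

```r
# Hypothetical helper: Euclidean distance between two numeric attribute vectors
euclidean_distance <- function(a, b) {
  stopifnot(length(a) == length(b))  # both customers need the same attributes
  sqrt(sum((a - b)^2))               # square root of the summed squared differences
}

# Attribute values taken from the manual calculation for Hawkeye and Potter
hawkeye <- c(32, 45, 25, 1)
potter  <- c(64, 75, 10, 3)

ED_Hawkeye_Potter_fn <- euclidean_distance(hawkeye, potter)
ED_Hawkeye_Potter_fn
```

Because the function is vectorized over the attributes, it works unchanged for any number of attributes, as long as both customers are described by the same ones.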
Question 1

Repeat the calculations for Hawkeye and Burns as well as Hawkeye and Hotlips. What can you tell about the distances between the persons?

2.2 Using distance function

While this is a great exercise, calculating the distances among all members of a large customer database by hand, e.g. one with 200,000 entries, is not feasible. In this case, we can instead use an R function for Euclidean distances, here distance() from the philentropy package. We simply pass the function a data frame with all observations we would like to compare, and R returns a table with the corresponding distances.

Code
library(philentropy)
distance(Table2_1[,2:5], method = "euclidean")
          v1       v2        v3        v4        v5
v1     0.000 33541.03  5830.978 32526.912 34669.872
v2 33541.035     0.00 34481.883 53600.382 59135.448
v3  5830.978 34481.88     0.000 26907.253 29529.653
v4 32526.912 53600.38 26907.253     0.000  7211.104
v5 34669.872 59135.45 29529.653  7211.104     0.000
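
If the philentropy package is not available, base R's dist() function offers a similar pairwise calculation. The sketch below is an illustration under assumptions: the column names f1 to f4 are placeholders, the first two rows reuse the Hawkeye and Potter values from the manual calculation, and the third row (CustomerX) is made up for the example.

```r
# Small illustrative data frame; f1..f4 are placeholder attribute names
customers <- data.frame(
  f1 = c(32, 64, 40),
  f2 = c(45, 75, 50),
  f3 = c(25, 10, 20),
  f4 = c(1, 3, 2),
  row.names = c("Hawkeye", "Potter", "CustomerX")  # CustomerX is hypothetical
)

# dist() returns the lower triangle of pairwise distances;
# as.matrix() expands it into a full symmetric matrix
dist_matrix <- as.matrix(dist(customers, method = "euclidean"))
round(dist_matrix, 2)
```

The diagonal of the resulting matrix is zero, since every customer has distance zero to itself, and the matrix is symmetric.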

3 K-means as a solution to form homogeneous subgroups

3.1 Scaling up distances

While the distances help us understand similarities and dissimilarities, they do not yet help us form subgroups, and from the distances alone you do not know which threshold separates similar from dissimilar. Hawkeye may be closest to Burns, but is 18 still a great distance? Or is it actually already pretty similar? Who should be paired with whom?

This implies that grouping consumers into homogeneous subgroups requires a lot of attention and balance, and more information than just similarity measures. In addition, we realized with our 5 customers that manual grouping takes time and effort, which makes it impractical for forming larger groups or segmenting larger data sets with hundreds of thousands of customers. Therefore, it is time to discover a method that uses intersubject distances to form groups automatically. Such methods are commonly referred to as cluster analysis. Cluster analysis is a well-known and established statistical method that has been used in marketing research for the last 30-40 years. With the advent of machine learning and artificial intelligence applications, cluster analysis has regained popularity in data science, where it is often referred to as an unsupervised learning algorithm.

3.2 Mechanics

3.2.1 Josh Starmer’s Video

Watch this video about how K-means clustering works.


3.2.2 K-Means Clustering Process

Note
  1. Step 1: K-means cluster analysis uses distances to form clusters within the data. Once the user has chosen the number of clusters k that the algorithm should find, the algorithm randomly places k starting points within the data.
  2. Step 2: It then calculates the distance from each observation to each starting point. As pointed out in the figure below, the algorithm assigns each observation to the closest starting point.
  3. Step 3: This leads to an initial cluster solution. For each of these clusters, the algorithm then calculates the new center point of the cluster, called the centroid. The centroid can be interpreted as the mean of all observations within this cluster.
  4. Step 4: The algorithm now repeats the procedure of Step 2. The new centroids are used to recalculate the distances between all observations and all centroids, and each observation is again assigned to the closest centroid. This may change cluster memberships and reshape the clusters.
  5. Step 5: In the subsequent step, the algorithm calculates the resulting new centroids and then again recalculates the distances and reassigns observations to clusters.

The algorithm stops once no observation can be re-assigned to another cluster or after a specified number of iterations.
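
The steps above can be sketched in a few lines of R. This is a minimal Lloyd-style illustration of the procedure, not the algorithm that R's kmeans() function uses by default. The function name simple_kmeans, its arguments, and the toy data are hypothetical and only serve the example.

```r
# Minimal Lloyd-style k-means sketch (illustration only; use kmeans() in practice)
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Step 1: choose k random observations as starting points
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Steps 2 and 4: squared Euclidean distance of each observation to each centroid
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centroids[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")  # index of the closest centroid
    # Stop once no observation is re-assigned to another cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Steps 3 and 5: recompute each centroid as the mean of its members
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centroids[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centroids, iterations = iter)
}

# Toy data: two clearly separated groups of 20 points each
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
fit <- simple_kmeans(toy, k = 2)
table(fit$cluster)
```

The sketch stops either when the assignments no longer change or after max_iter iterations, which mirrors the two stopping rules described above.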


4 Step 1: Standardizing Scale Levels

One thing to keep in mind before running a cluster analysis is scale heterogeneity. K-means clustering in particular is sensitive to variables measured on different scales, because variables with large numeric ranges dominate the distance calculation. Variables on very different scales can thus lead to biased results. A quick fix is to rescale the variables so that they all share a similar range. This procedure is commonly referred to as standardization.

R can standardize all variables for us with the help of the scale() function. We can then inspect the resulting object scaled.crm with the head() function.

Code
scaled.crm = scale(BankinCRMData, center = TRUE, scale = TRUE) # Z score
head(scaled.crm)
            Age     Income HouseholdSize CityAreaSize MeanCityIncome
[1,]  0.3576515 -0.1028962    -0.6110203    0.3617815     -1.1309409
[2,]  0.1361200 -0.2651878     1.6529279    0.3520407     -0.1078328
[3,]  2.6468104 -0.1251715    -1.3656697    0.3701496     -1.7221925
[4,]  1.3176214 -0.2991988     0.1436291    0.3701496     -0.7011903
[5,]  0.3576515 -0.2119217    -1.3656697    0.3500189     -1.9739204
[6,] -0.2330992 -0.1816402     3.1622266    0.3681716     -1.1212168
     MeanCityHousePrize MeanCityHouseHoldSize MeanCitySqFtPrice NumberCars
[1,]           1.271530           -0.01716176       -0.62863020  0.6657393
[2,]           1.270998            1.47667561       -0.02266382  0.6657393
[3,]           1.271494           -0.01716176       -0.79901909 -0.4146800
[4,]           1.270582            0.72975692       -1.32689056 -1.4950993
[5,]           1.270505            2.22359429       -1.31895578 -0.4146800
[6,]           1.271217           -0.01716176       -0.26404808 -0.4146800
     InternetTrafficVolume MortageVolume AccountSpending CreditCardSpending
[1,]            -0.2889997     1.6471061      -0.3450280          0.8636732
[2,]            -0.9537126     1.2700695      -0.1509407         -0.6818782
[3,]            -0.3192140     0.5749801      -0.3528008          0.5714726
[4,]             0.1037852     1.3859733      -0.2801354          0.2591261
[5,]            -0.8630700     1.0690862      -0.1072321          0.2034089
[6,]            -0.4702851     0.9330313      -0.3513856         -0.1526760
     HelpHotlineTime CustomerSince GrocerySpending StockVolume CreditVolume
[1,]      -0.8632445     0.7723802     -0.34104672  -0.6842514   -0.4354132
[2,]      -0.8427150     0.7723802      0.19341832  -0.5016177   -0.4407973
[3,]      -0.9302276     0.7723802      0.12646708  -0.6849314   -0.4507881
[4,]      -0.6214784     0.7723802      0.08311691  -0.5265523   -0.4594416
[5,]      -0.6937764     0.8184972     -0.75375557  -0.7384732   -0.4396339
[6,]      -0.5917007     0.7723802      0.09847517  -0.3919688   -0.4472727
     NASDAQInvest USAXSFundInvest BranchVisits AppLogins ATMVisits
[1,]   -0.2353360      -0.3417909  -0.28345927 -1.319352  1.751956
[2,]   -0.2240967      -0.3244452   0.02716123 -1.059550  1.321701
[3,]   -0.2269506      -0.3549125  -0.28345927 -1.203885  1.321701
[4,]   -0.2294082      -0.3270950   0.02716123 -1.203885  1.321701
[5,]   -0.2268875      -0.3721992  -0.28345927 -1.146151  1.751956
[6,]   -0.2259212      -0.2313141   0.02716123 -1.232751  1.321701
     TimeOnlineBanking ServiceFees SocialMediaInter   Bitcoins       NFTs
[1,]        -0.7152344  0.02878044        0.4675133 -0.8542812 -0.4895549
[2,]        -0.7897758  0.36121889        0.3502121 -0.8520388 -0.8611631
[3,]        -0.9289516  0.44046295        0.5261639 -0.8076386 -0.8611631
[4,]        -0.8941460  0.02794600        0.8780673 -0.8614570 -0.8611631
[5,]        -0.7894912  0.52092062        0.8780673 -0.8349963 -1.2327713
[6,]        -0.9766564  0.51830595        0.2915616 -0.8551782 -0.1179467

As you can see, all variables now lie in a similar range. We can thus proceed with our analysis.
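
To make the standardization step more concrete, the following sketch compares a manual z-score calculation, (x - mean(x)) / sd(x), with the result of scale(). The data frame toy and its values are invented for illustration and are not part of the CRM data.

```r
# Hypothetical toy data with two variables on very different scales
toy <- data.frame(
  Age    = c(25, 40, 55, 70),
  Income = c(40000, 60000, 90000, 150000)
)

# Manual z-scores: subtract the column mean, divide by the column standard deviation
manual_z <- sapply(toy, function(x) (x - mean(x)) / sd(x))

# The same standardization with scale()
scaled_toy <- scale(toy, center = TRUE, scale = TRUE)

# Both approaches should agree; each column then has mean 0 and sd 1
same_result <- isTRUE(all.equal(manual_z, scaled_toy, check.attributes = FALSE))
same_result
round(colMeans(scaled_toy), 10)
apply(scaled_toy, 2, sd)
```

After standardization, a difference of one unit means one standard deviation in every variable, so no single variable dominates the Euclidean distance simply because of its unit of measurement.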

4.1 Arbitrary K-means Solution with k=4

We can now start with the cluster analysis. Let us first try out an arbitrary number of clusters. Because k-means starts from randomly chosen centroids, we call the set.seed() function first so that we obtain the same results every time we run this code. If you do not use set.seed() ahead of the cluster analysis, you will receive different solutions, which will be close to each other but not identical. We can run a k-means cluster analysis with R’s kmeans() function. We tell kmeans() which data object contains our customer data and specify the number of clusters we want. Here we set k to 4.

  • The algorithm of Hartigan and Wong (1979) is used by default.

  • Note that some authors use k-means to refer to a specific algorithm rather than the general method: most commonly the algorithm given by MacQueen (1967) but sometimes that given by Lloyd (1957) and Forgy (1965). The Hartigan–Wong algorithm generally does a better job than either of those, but trying several random starts (nstart>1) is often recommended. In rare cases, when some of the points (rows of x) are extremely close, the algorithm may not converge in the “Quick-Transfer” stage, signalling a warning (and returning ifault = 4). Slight rounding of the data may be advisable in that case.
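
The effect of several random starts can be illustrated with a small sketch. The toy data below are simulated for the example and are not the CRM data. The nstart argument of kmeans() controls how many random starting configurations are tried, and the best solution in terms of total within-cluster sum of squares is kept.

```r
# Simulated toy data: three separated groups of 50 points each
set.seed(123)
toy <- rbind(matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
             matrix(rnorm(100, mean = 3, sd = 0.5), ncol = 2),
             matrix(rnorm(100, mean = 6, sd = 0.5), ncol = 2))

# One random start versus 25 random starts
single_start <- kmeans(toy, centers = 3, nstart = 1)
multi_start  <- kmeans(toy, centers = 3, nstart = 25)

# Compare the total within-cluster sum of squares (lower is better)
c(single = single_start$tot.withinss, multi = multi_start$tot.withinss)
multi_start$size
```

On easy data like this, both runs will often agree. On real customer data with overlapping segments, a larger nstart reduces the risk of reporting a poor local solution, at the cost of a longer run time.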

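```{r}
# Fix the random starting centroids so that the results are reproducible
set.seed(123)
# Run k-means on the standardized data with k = 4 clusters
StrattonCluster_4k <- kmeans(scaled.crm, 4)
StrattonCluster_4k
# Number of customers in each cluster
StrattonCluster_4k[["size"]]
# Collect the cluster sizes in a data frame for later use
sizes4k <- data.frame(Size = StrattonCluster_4k[["size"]], 
                      Cluster = c("Cluster1", "Cluster2", "Cluster3", "Cluster4"))
```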
K-means clustering with 4 clusters of sizes 1000, 1250, 2996, 5504

Cluster means:
         Age     Income HouseholdSize CityAreaSize MeanCityIncome
1  0.4957394 -0.3386598    0.47642946    0.3414837     -0.8350901
2  0.1625265 -0.1976437   -0.52529213    0.3582921      1.1039357
3  0.6297842  1.4843069    0.12524143   -0.9306195     -0.7429876
4 -0.4697913 -0.7015387   -0.03543561    0.3631517      0.3054436
  MeanCityHousePrize MeanCityHouseHoldSize MeanCitySqFtPrice  NumberCars
1         -1.1362805           0.837313214       -1.26447602 -0.03113115
2         -0.6899384          -0.750337139        0.36552584 -1.10441967
3         -0.5157368          -0.006690937        0.10248511  0.81828180
4          0.6438683           0.021921195        0.09093811 -0.18893832
  InternetTrafficVolume MortageVolume AccountSpending CreditCardSpending
1             1.4329995    -0.5650106      0.02116941        -0.56003954
2             1.4337186     0.3434931     -0.48984671        -0.98809483
3            -0.9842698     0.5143171      1.33048187         0.73046345
4            -0.0501954    -0.2553143     -0.61682135        -0.07145901
  HelpHotlineTime CustomerSince GrocerySpending StockVolume CreditVolume
1      -0.4072184    -0.3120147       0.3798751   0.9081969   0.14255359
2      -0.2284210    -0.7718011      -0.3785228   2.2435110   1.81347551
3       1.1635969     1.0279632       1.2118918   0.1702179  -0.90296052
4      -0.5075203    -0.3275821      -0.6427233  -0.7671800   0.05375577
  NASDAQInvest USAXSFundInvest BranchVisits  AppLogins  ATMVisits
1    1.1548262       2.5528726   -0.8593497  0.9937584  0.9357632
2    1.8457024       0.3440906   -0.7150975  1.7089551 -0.7479942
3    0.2932005      -0.6714841    1.1815166 -0.3830160 -0.9267941
4   -0.7885870      -0.1764570   -0.3246007 -0.3601810  0.5043431
  TimeOnlineBanking ServiceFees SocialMediaInter    Bitcoins       NFTs
1         1.0350606   1.1048109        0.9562486  1.82244896  1.1722769
2         1.8769642   2.2202269        2.1344096  1.37359382  2.1134861
3        -0.4039396  -0.3502199       -0.8524771  0.02738211 -0.4874463
4        -0.3944518  -0.5143233       -0.1944475 -0.65797203 -0.4276427

Clustering vector:
    [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
   [37] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
   [73] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [109] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [145] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [181] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [217] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [253] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [289] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [325] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [361] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [397] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [433] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [469] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [505] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [541] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [577] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [613] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [649] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [685] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [721] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [757] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [793] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [829] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [865] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [901] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [937] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
  [973] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1009] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1045] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1081] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1117] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1153] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1189] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1225] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1261] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1297] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1333] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1369] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1405] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1441] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [1477] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1
 [1513] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1549] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1585] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1621] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1657] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1693] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1729] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1765] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1801] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1837] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1873] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1909] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1945] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [1981] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2017] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2053] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2089] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2125] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2161] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2197] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2233] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2269] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2305] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2341] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2377] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2413] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2449] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [2485] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2521] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2557] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2593] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2629] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2665] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2701] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2737] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2773] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2809] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2845] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2881] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2917] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2953] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [2989] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3025] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3061] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3097] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3133] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3169] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3205] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3241] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3277] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3313] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3349] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3385] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3421] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3457] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3493] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3529] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3565] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3601] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3637] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3673] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3709] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [3745] 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [3781] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [3817] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [3853] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [3889] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [3925] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [3961] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [3997] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4033] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4069] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4105] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4141] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4177] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4213] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4249] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4285] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4321] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4357] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4393] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4429] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4465] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4501] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4537] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4573] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4609] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4645] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4681] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4717] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4753] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4789] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4825] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4861] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4897] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4933] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [4969] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [5005] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [5041] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [5077] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [5113] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [5149] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [5185] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [5221] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4
 [5257] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5293] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5329] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5365] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5401] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5437] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5473] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5509] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5545] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5581] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5617] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5653] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5689] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5725] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5761] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5797] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5833] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5869] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5905] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5941] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [5977] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6013] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6049] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6085] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6121] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6157] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6193] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6229] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6265] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6301] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6337] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6373] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6409] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6445] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6481] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6517] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6553] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6589] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6625] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6661] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6697] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6733] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6769] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6805] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6841] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6877] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6913] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6949] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [6985] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7021] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7057] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7093] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7129] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7165] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7201] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7237] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7273] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7309] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7345] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7381] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7417] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7453] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7489] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7525] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7561] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7597] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7633] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7669] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7705] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7741] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7777] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7813] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7849] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7885] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7921] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7957] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [7993] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8029] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8065] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8101] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8137] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8173] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8209] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8245] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8281] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8317] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8353] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8389] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8425] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8461] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8497] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8533] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8569] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8605] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8641] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8677] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8713] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8749] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8785] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8821] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8857] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8893] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8929] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [8965] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [9001] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [9037] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [9073] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [9109] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [9145] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [9181] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [9217] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3
 [9253] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9289] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9325] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9361] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9397] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3
 [9433] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3 3 3
 [9469] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9505] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9541] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9577] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9613] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9649] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9685] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9721] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9757] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9793] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9829] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9865] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9901] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9937] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [9973] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10009] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10045] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10081] 3 3 3 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10117] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10153] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10189] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10225] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10261] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10297] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10333] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10369] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10405] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10441] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10477] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10513] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10549] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10585] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10621] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3 3 3 3
[10657] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10693] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10729] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3

Within cluster sum of squares by cluster:
[1]  3850.269  1289.676 50440.263 81630.942
 (between_SS / total_SS =  54.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      
[1] 1000 1250 2996 5504

4.1.1 Cluster Size

To visualize the number of customers assigned to each cluster, we also plot the cluster sizes using ggplot and a simple bar chart.

Code
sizes4k |> 
ggplot(aes(x=factor(Cluster), y=Size)) + 
  geom_col(fill=hcl(195, 100, 65)) +
  geom_text(aes(label=Size), vjust=0) + 
  xlab("Cluster") +
  ylab("Size") + 
  ggtitle("Cluster sizes for k-means 4-cluster solution")

4.1.2 Means Table by 4 Clusters

We can now inspect the different clusters and check their mean values. We achieve this with the following code, which first attaches the estimated cluster to each observation in our data frame. Subsequently, we use dplyr’s group_by command to calculate the mean of each variable per cluster. You can then inspect the resulting data frame. You will notice that some clusters exhibit substantially different mean values for specific variables, whereas for other variables the means hardly vary across the clusters.

Code
#Build Cluster Specific Means for all Variables
BankinCRMData$k4Cluster = StrattonCluster_4k[["cluster"]]

BankinCRMData
Code
BankinCRMData.means.percluster_4k = BankinCRMData %>% 
  group_by(k4Cluster) %>% 
  summarise_if(is.numeric, mean, na.rm = TRUE) 

BankinCRMData.means.percluster_4k |> 
  t() |> 
  round(0)
                        [,1]   [,2]   [,3]    [,4]
k4Cluster                  1      2      3       4
Age                       42     37     44      29
Income                 67991  74948 157931   50088
HouseholdSize              3      2      3       3
CityAreaSize          450058 453890 160005  454998
MeanCityIncome        109775 235000 115723  183432
MeanCityHousePrize    131557 450106 574431 1402025
MeanCityHouseHoldSize      4      2      3       3
MeanCitySqFtPrice       2290   6194   5564    5536
NumberCars                 1      0      2       1
InternetTrafficVolume    115    115     35      66
MortageVolume         124792 250262 273854  167563
AccountSpending         1296    797   2577     672
CreditCardSpending       751    550   1356     980
HelpHotlineTime            8     10     26       7
CustomerSince             12      3     42      12
GrocerySpending          650    422    900     342
StockVolume             3501   5499   2397     994
CreditVolume            1501   3500    250    1394
NASDAQInvest            3500   4500   2253     687
USAXSFundInvest         2897   1049    200     614
BranchVisits               1      2      8       3
AppLogins                 90    115     42      43
ATMVisits                  7      3      3       6
TimeOnlineBanking        175    225     90      91
ServiceFees               75    110     30      25
SocialMediaInter          35     55      4      16
Bitcoins                   1      0      0       0
NFTs                       6      9      2       2
Task 1

Transpose the data and reformat it to increase readability. Then, assess the quality of the 4-cluster solution by comparing all means across the 4 clusters.

4.1.3 Segments Plot in two dimensional space

Another approach to assess the quality of our segmentation is to plot the different clusters. A key challenge here is dimensionality. Given that our clusters depend on a multitude of variables, we cannot plot them all together. To arrive at a solution that we can plot, we need to reduce the dimensions to two main factors, which then allows us to plot the points in a two-dimensional space. A common technique for achieving this is principal component analysis (PCA), which condenses all variables into two main components that can subsequently be plotted. The plot then enables us to better assess whether clusters overlap or whether we achieve a meaningful separation between the identified clusters. R’s factoextra package offers various functions that achieve this with a single command, eliminating the need to code the PCA or the plot by hand.

Code
#Plot Clusters for 4k solution
library(factoextra)

fviz_cluster(StrattonCluster_4k, BankinCRMData, ellipse.type = "norm") # object, original data

Code
fviz_cluster(StrattonCluster_4k, scaled.crm, ellipse.type = "norm") # object, scaled data; no difference observed. 
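To make the projection step less of a black box, the following minimal sketch reproduces the idea by hand with base R. It runs on a small synthetic data set (the object names `toy`, `toy_km`, and `scores` are illustrative assumptions, not part of the case study) and is not the internal implementation of fviz_cluster. As far as the factoextra documentation states, fviz_cluster standardizes the data by default (`stand = TRUE`), which would explain why the raw and the scaled data produce the same picture above.

```r
# Synthetic stand-in data (assumption): two groups, four numeric variables
set.seed(42)
toy <- rbind(
  matrix(rnorm(200, mean = 0), ncol = 4),
  matrix(rnorm(200, mean = 4), ncol = 4)
)
colnames(toy) <- paste0("V", 1:4)
toy_scaled <- scale(toy)

# k-means on the standardized data
toy_km <- kmeans(toy_scaled, centers = 2, nstart = 10)

# PCA on the same data; keep the scores on the first two components
toy_pca <- prcomp(toy_scaled)
scores <- as.data.frame(toy_pca$x[, 1:2])
scores$cluster <- factor(toy_km$cluster)

# Share of the total variance captured by each component
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
round(var_explained[1:2], 2)

# Scatter plot of all observations in the PC1/PC2 plane, colored by cluster
plot(scores$PC1, scores$PC2, col = scores$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2")
```

The share of variance explained by the two plotted components is worth checking: if it is low, clusters that overlap in the plot may still be well separated in the full variable space.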

A quick inspection of the plot already reveals that our 4-cluster solution is not optimal, as we can make out further separable groups of closely spaced observations. Especially in the case of Clusters 3 and 4 (the larger ones), it appears that we can still split each of these groups into two additional subgroups.

Question 2

Repeat the cluster analysis with k = 5 and k = 6. What can you tell about the results?

5 Step 2: Determine K

Trying out different solutions may point you in the right direction, but you will soon realize that determining the optimal number of clusters can be challenging.

To find the “best” number of clusters, there are different approaches and measures available. Before we discuss these, let’s first reflect on what we want to achieve with a cluster analysis.

We want to obtain subgroups that are internally homogeneous. In other words, maximizing within-group homogeneity means minimizing the variance among the members of each cluster. The sum of the within-cluster variances across all identified clusters can thus be used to describe the total degree of homogeneity obtained with a specific cluster solution. This gives us the opportunity to compare cluster solutions with varying numbers of clusters and to look for a solution that keeps the overall within-cluster variance low.
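In symbols, this total can be written as follows. The notation is introduced here for illustration only: \(C_1, \dots, C_k\) are the clusters of a k-cluster solution, \(\mu_j\) is the centroid of cluster \(C_j\), and \(x_i\) is the (scaled) observation vector of customer \(i\).

\[ W_k = \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2 \]

R’s kmeans reports this quantity as tot.withinss and its per-cluster parts as withinss. The ratio between_SS / total_SS printed in the k-means output equals \(1 - W_k / \text{totss}\), so a higher ratio means more homogeneous clusters.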

Using the within-cluster variance values, we can determine which solution works best and then focus on this cluster analysis. To do so, we first estimate a series of cluster solutions, with the number of clusters running from 1 up to a chosen maximum. Subsequently, we plot the within-cluster variance sum of each solution.

Again, R can do this for us with a few short lines of code. Below you will find several measures based on within-cluster variances. We can now ask R to estimate k-means models with k values from 1 to 15 and then plot the within-cluster variances of each solution. Don’t worry if this takes some time.

5.1 Elbow Method

Code
#Obtain Elbow Plots to determine optimal k 

factoextra::fviz_nbclust(na.omit(scaled.crm), kmeans, method = "wss", k.max = 15) # wss: within-cluster sum of squares

The Elbow plot shows the total within-cluster sum of squares for all 15 estimated solutions. The rule of thumb states that the optimal number of clusters lies at the “elbow” of the curve, that is, at the point after which additional clusters reduce the within-cluster variance only marginally. Locating it is rather tricky here, as the curve drops immediately and shows very low summed variances for k = 2 to 15. Therefore, we rely on a second method, the Silhouette plot.
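The curve behind an elbow plot can also be computed by hand, which makes clear what is being plotted. The following is a minimal sketch on synthetic data with three well-separated groups (the objects `toy`, `k_values`, and `wss` are illustrative assumptions); it is not the internal code of fviz_nbclust.

```r
# Synthetic stand-in data (assumption): three groups, three numeric variables
set.seed(42)
toy <- scale(rbind(
  matrix(rnorm(150, mean = 0),  ncol = 3),
  matrix(rnorm(150, mean = 5),  ncol = 3),
  matrix(rnorm(150, mean = 10), ncol = 3)
))

# Fit one k-means model per k and store its total within-cluster sum of squares
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# Elbow curve: look for the k after which the curve flattens
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

For k = 1 the value equals the total sum of squares of the data, so the curve always starts at its maximum. Using several random starts (`nstart`) reduces the risk that a poor local optimum distorts the curve.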

5.2 Silhouette Method

5.2.1 Concepts

The silhouette score for each data point is defined as:

\[ s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}\], where:

  • \(a(i)\) = average distance from the point to all other points in the same cluster (intra-cluster distance)

  • \(b(i)\) = average distance from the point to all points in the nearest other cluster (inter-cluster distance)

The silhouette score \(s(i)\) ranges from −1 to +1:

  • +1: the point is well-clustered

  • 0: the point lies on the boundary between two clusters

  • –1: the point might be in the wrong cluster

You compute the average silhouette score over all data points for each value of k, and choose the k that maximizes this average.
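The definition above can be checked with a few lines of R. This minimal sketch uses synthetic data with two well-separated groups (the names `toy`, `km2`, `s1`, and `avg_sil` are illustrative assumptions). It first computes \(s(i)\) for a single point by hand and then uses silhouette() from the cluster package to obtain the average silhouette width for several values of k.

```r
library(cluster)  # provides silhouette()

# Synthetic stand-in data (assumption): two groups, two numeric variables
set.seed(42)
toy <- scale(rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 6), ncol = 2)
))
toy_dist <- dist(toy)

# Silhouette score of the first observation, computed by hand for k = 2
km2 <- kmeans(toy, centers = 2, nstart = 10)
d   <- as.matrix(toy_dist)
own <- km2$cluster == km2$cluster[1]
a1  <- mean(d[1, own & seq_len(nrow(toy)) != 1])  # mean distance within own cluster
b1  <- mean(d[1, !own])                           # mean distance to the other cluster
s1  <- (b1 - a1) / max(a1, b1)

# The same value as returned by cluster::silhouette()
sil2 <- silhouette(km2$cluster, toy_dist)
c(by_hand = s1, from_package = sil2[1, "sil_width"])

# Average silhouette width for k = 2, ..., 6; choose the k with the maximum
avg_sil <- sapply(2:6, function(k) {
  km <- kmeans(toy, centers = k, nstart = 10)
  mean(silhouette(km$cluster, toy_dist)[, "sil_width"])
})
names(avg_sil) <- 2:6
best_k <- as.integer(names(which.max(avg_sil)))
best_k
```

With only two clusters, the nearest other cluster is simply the other cluster; for larger k, \(b(i)\) is the smallest of the mean distances to each of the other clusters.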

5.2.2 Applying it to the Data

Code
#Obtain Silhouette Plots to determine optimal k 

factoextra::fviz_nbclust(na.omit(scaled.crm), kmeans, method = "silhouette", k.max = 15)

5.2.3 Interpretation

The silhouette coefficient measures how close an object is to the members of its own cluster compared with the members of the nearest other cluster. The coefficient ranges from −1 to +1. High values indicate strong separation; low values indicate poor separation. We thus want to select the cluster solution with the highest average silhouette coefficient. In our case, the plot suggests 8 clusters. Looking again at the Elbow plot, 8 seems rather high, especially as the “elbow” lies somewhere between 5 and 7. The silhouette plot also suggests that the 7-cluster solution is inferior to the 6- and 8-cluster solutions. We may thus enrich our insights by plotting all three solutions, which we do in Step 3.

5.3 Gap Statistics

5.3.1 Concept

  • Tibshirani, R., Walther, G. and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B, 63(2), 411–423.

The Gap Statistic compares the total within-cluster variation for different values of k with their expected values under a null reference distribution (i.e., data with no clustering structure).

It was introduced by Tibshirani, Walther, and Hastie (2001) and is defined as follows:

\[ \text{Gap}(k) = \mathbb{E}[\log(W_k^*)] - \log(W_k) \]

Where:

  • \(W_k\): total within-cluster dispersion for your real data

  • \(W_k^*\): total within-cluster dispersion for reference data (generated from a uniform distribution)

  • \(\mathbb{E}[\log(W_k^*)]\): average log within-cluster dispersion for B reference datasets

The optimal number of clusters is the smallest k such that:

\[ \text{Gap}(k) \geq \text{Gap}(k+1) - s_{k+1} \]

Where \(s_{k+1}\) is the standard error of the gap statistic at \(k+1\).

5.3.2 Applied to the Data

Code
```{r}
#| eval: false
library(cluster)
set.seed(123)
gap_stat <- clusGap(na.omit(scaled.crm), 
                    FUNcluster = kmeans, 
                    K.max = 9, B = 60)
print(gap_stat)
fviz_gap_stat(gap_stat)
```
Results
> print(gap_stat)
Clustering Gap statistic ["clusGap"] from call:
clusGap(x = na.omit(scaled.crm), FUNcluster = kmeans, K.max = 9, B = 60)
B=60 simulated reference sets, k = 1..9; spaceH0="scaledPCA"
 --> Number of clusters (method 'firstSEmax', SE.factor=1): 8
          logW   E.logW       gap      SE.sim
 [1,] 9.849855 10.24394 0.3940806 0.001610821
 [2,] 9.649660 10.14771 0.4980528 0.001269681
 [3,] 9.589105 10.12340 0.5342959 0.002450073
 [4,] 9.355471 10.10131 0.7458421 0.002305038
 [5,] 9.072338 10.08633 1.0139880 0.001651291
 [6,] 9.003449 10.07295 1.0695044 0.001460224
 [7,] 8.949475 10.06186 1.1123887 0.001367756
 [8,] 8.903180 10.05149 1.1483141 0.001347131
 [9,] 8.979300 10.04303 1.0637305 0.001284367

fviz_gap_stat(gap_stat)

Gap Statistics with B = 60
  • k = 8 is the optimal number of clusters, as it is the smallest k that satisfies Gap(k) ≥ Gap(k+1) − SE(k+1): Gap(8) ≥ Gap(9) − SE(9), i.e., 1.148 ≥ 1.064 − 0.001.
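The decision rule can be retraced with the values from the printed table. The sketch below copies the gap and SE.sim columns from the output above into two vectors (the names `gap`, `se`, and `k_opt` are illustrative assumptions) and applies the rule by hand. As a cross-check, it also calls maxSE() from the cluster package with the Tibshirani et al. criterion; note that clusGap itself reported the result using its default method 'firstSEmax'.

```r
# Gap values and simulation standard errors, copied from the output above
gap <- c(0.3940806, 0.4980528, 0.5342959, 0.7458421, 1.0139880,
         1.0695044, 1.1123887, 1.1483141, 1.0637305)
se  <- c(0.001610821, 0.001269681, 0.002450073, 0.002305038, 0.001651291,
         0.001460224, 0.001367756, 0.001347131, 0.001284367)

# Rule: smallest k with Gap(k) >= Gap(k + 1) - s_{k + 1}
k <- seq_len(length(gap) - 1)
meets_rule <- gap[k] >= gap[k + 1] - se[k + 1]
k_opt <- which(meets_rule)[1]
k_opt  # 8

# Cross-check with the helper from the cluster package
k_opt_pkg <- cluster::maxSE(gap, se, method = "Tibs2001SEmax")
k_opt_pkg
```

For k = 1 to 7 the gap keeps increasing by far more than one standard error, so the rule is first met at k = 8, where the gap value drops for k = 9.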
Question 3

Repeat the Gap Statistic analysis with 30 bootstrapped samples (B = 30). Report and interpret the results. Do you see a significant difference compared with the analysis above with B = 60?

5.4 Comparison of Three Methods

| Method | Measures | Pros | Cons |
|---|---|---|---|
| Elbow | WCSS (within-cluster sum of squares) | Simple and fast | Subjective; elbow is often unclear |
| Silhouette | Separation and cohesion | More interpretable; good for cluster quality insight | Computationally heavier |
| Gap Statistics | WCSS vs. null distribution | Statistically grounded; includes standard error | Computationally expensive (especially with high B) |

5.4.1 Summary

  • The Gap Statistic is especially useful when you want a formal test-like approach to select k, as it incorporates a reference distribution and standard error.

  • It is more robust and objective than the elbow method.

  • It can outperform the silhouette method when the cluster structure is subtle or noisy.

  • But it is also slower, especially for large datasets.

  • If you’re doing a rigorous analysis and can afford the computing time, Gap Statistic is arguably the best overall; if you want speed and interpretability, Silhouette is an excellent alternative. Use all three methods if possible to triangulate the best k.

6 Step 3: K-means solutions for optimal K candidates

6.1 K=6, 7, 8

Code
#Plot Cluster Solutions
#k6
set.seed(321)
StrattonCluster_6k <- kmeans(scaled.crm, 6)
fviz_cluster(StrattonCluster_6k, scaled.crm, ellipse.type = "norm")

Code
#k7
set.seed(321)
StrattonCluster_7k <- kmeans(scaled.crm, 7)
fviz_cluster(StrattonCluster_7k, scaled.crm, ellipse.type = "norm")

Code
#k8
set.seed(321)
StrattonCluster_8k <- kmeans(scaled.crm, 8)
fviz_cluster(StrattonCluster_8k, scaled.crm, ellipse.type = "norm")

7 Step 4: Interpretation of 8 Cluster Output

Let us start by looking in more detail at our 8-cluster k-means model and checking how big each cluster is, with the following code.

7.1 K=8 Cluster Size

Code
# 8 cluster k-means cluster size plot

sizes8k <- data.frame(Size = StrattonCluster_8k[["size"]], 
                      Cluster = c("Cluster1", "Cluster2", "Cluster3", "Cluster4",
                                  "Cluster5", "Cluster6", "Cluster7", "Cluster8"))
sizes8k |> 
ggplot(aes(factor(Cluster), Size)) + 
  geom_col(fill=hcl(195, 100, 65)) + 
  xlab("Cluster") + 
  ylab("Size") + 
  geom_text(aes(label=Size), vjust=0) + 
  ggtitle("Cluster sizes k-means 8-cluster solution")

7.2 Means Table by 8 Clusters

To gain deeper insights into spending behavior and the digital affinity of different segments, we aim to plot the means of the various variables. To achieve this, we first assemble a descriptive data set with all variable means per cluster with the help of dplyr’s group_by function.

Code
# Build Mean per Cluster DataFrame
BankinCRMData$k8Cluster = StrattonCluster_8k$cluster

BankinCRMData.means.percluster_8k = BankinCRMData |> 
  group_by(k8Cluster) |>  
  summarise_if(is.numeric, mean, na.rm = TRUE)

BankinCRMData.means.percluster_8k
Code
glimpse(BankinCRMData.means.percluster_8k)
Rows: 8
Columns: 30
$ k8Cluster             <int> 1, 2, 3, 4, 5, 6, 7, 8
$ Age                   <dbl> 57.96533, 25.68300, 23.09436, 37.35760, 41.87000…
$ Income                <dbl> 180029.46, 90409.48, 37983.58, 74948.43, 67991.1…
$ HouseholdSize         <dbl> 3.488000, 2.194000, 3.635214, 2.113600, 3.441000…
$ CityAreaSize          <dbl> 120068.7, 132506.0, 690126.4, 453890.4, 450057.9…
$ MeanCityIncome        <dbl> 139990.5, 115687.9, 250170.2, 235000.0, 109774.5…
$ MeanCityHousePrize    <dbl> 620012.3, 369396.5, 1849960.6, 450105.5, 131557.…
$ MeanCityHouseHoldSize <dbl> 3.366667, 2.533333, 2.276265, 2.018400, 4.144000…
$ MeanCitySqFtPrice     <dbl> 6497.381, 3458.722, 8594.622, 6193.529, 2290.453…
$ NumberCars            <dbl> 2.7700000, 1.3866667, 0.8939689, 0.3616000, 1.35…
$ InternetTrafficVolume <dbl> 15.01533, 50.01333, 84.94163, 115.01680, 114.993…
$ MortageVolume         <dbl> 149966.17, 318868.28, 14989.43, 250262.26, 12479…
$ AccountSpending       <dbl> 3501.7735, 1097.0318, 551.9743, 796.5764, 1296.4…
$ CreditCardSpending    <dbl> 1900.2100, 1155.2976, 649.3741, 549.8931, 750.66…
$ HelpHotlineTime       <dbl> 34.911152, 15.795097, 4.358455, 10.275851, 8.287…
$ CustomerSince         <dbl> 65.082000, 10.751333, 2.958171, 2.516000, 12.486…
$ GrocerySpending       <dbl> 1200.1166, 474.9574, 264.9690, 421.9176, 650.101…
$ StockVolume           <dbl> 2393.6115, 1850.4302, 649.3551, 5499.0369, 3501.…
$ CreditVolume          <dbl> 149.3357, 249.7797, 2499.1785, 3500.3457, 1500.7…
$ NASDAQInvest          <dbl> 3003.1411, 925.6698, 400.1223, 4499.8338, 3499.9…
$ USAXSFundInvest       <dbl> 100.0340, 900.1365, 150.3297, 1049.4645, 2897.03…
$ BranchVisits          <dbl> 9.985333, 4.506333, 2.291829, 1.610400, 1.146000…
$ AppLogins             <dbl> 10.02533, 54.98833, 65.09241, 114.90560, 90.1300…
$ ATMVisits             <dbl> 4.036000, 2.775333, 6.167315, 3.189600, 7.103000…
$ TimeOnlineBanking     <dbl> 30.03605, 137.57997, 84.98368, 224.96318, 175.13…
$ ServiceFees           <dbl> 44.87447, 15.09119, 24.97558, 109.95768, 75.2831…
$ SocialMediaInter      <dbl> 2.502667, 5.498333, 16.380350, 55.420800, 35.333…
$ Bitcoins              <dbl> 0.0001021333, 0.2002524667, 0.1000003891, 0.4999…
$ NFTs                  <dbl> 1.000000, 2.007000, 3.023346, 9.004800, 6.472000…
$ k4Cluster             <dbl> 3.000000, 3.501333, 4.000000, 2.000000, 1.000000…
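
Note that the last column, k4Cluster, appears to be the average of the cluster labels from an earlier solution. Cluster labels are categories, so a mean such as 3.50 has no substantive interpretation. The following is a minimal sketch of how such a label column could be kept out of the averaging. It assumes a small made-up data frame called `toy` (hypothetical, not the CRM data) and uses dplyr's across() helper instead of summarise_if.

```r
library(dplyr)

# Toy stand-in for the CRM data (values are made up for illustration)
toy <- tibble(
  k8Cluster   = c(1L, 1L, 2L, 2L),
  k4Cluster   = c(3, 3, 3, 4),
  ServiceFees = c(40, 50, 10, 20)
)

# Average all numeric columns per cluster, but skip the old label column.
# The grouping column k8Cluster is left out by across() automatically.
toy_means <- toy |>
  group_by(k8Cluster) |>
  summarise(across(where(is.numeric) & !k4Cluster, \(x) mean(x, na.rm = TRUE)))

toy_means
```

The result keeps only k8Cluster and the mean of ServiceFees, which is the shape needed for the bar plots that follow.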

7.3 ServiceFees by 8 clusters

We can now generate bar plots of the variables of interest and check whether we find promising segments of Stratton & Fils customers who might be open to and suitable for Stratton AE Banking. Let us first focus on spending behavior, as indicated by the service fee variable. Note that we adapted some of the ggplot commands. By calling geom_col() without arguments we do not specify a fill color, so the bars keep ggplot's default dark grey. In addition, we ask geom_text to add labels with the values of ServiceFees rounded to two digits, in white and in font size 4. With position_stack(vjust = 0.5) we place the labels in the middle of each bar.

Code
#Barplot of Service Fees 

BankinCRMData.means.percluster_8k |> 
ggplot(aes(factor(k8Cluster), ServiceFees)) + 
  geom_col() + 
  geom_text(aes(label = round(ServiceFees, digits = 2)),
               size = 4, colour = "white", 
               position = position_stack(vjust = 0.5)) +
  labs(x = "Clusters",
       y = "Extra Fees paid for banking services",
       title = "Average Spending in Service Fees per Cluster")

Insights
  • A visual inspection indicates that clusters 4 and 5 pay the highest average service fees, with clusters 1 and 8 following, while the remaining clusters show rather low service fee spending. This makes at least the four high-spending segments attractive for AE Banking.

  • However, to be sure that the rather novel and highly digital app service appeals to these segments, we need to understand how digitally active and interested these segments are.

8 Step 5: Profiling Segment Personas

8.1 Fintech by 8 clusters

Let us first focus on the latest developments in fintech such as Bitcoin and NFT investments. We can again compare the segment-specific means for both variables. This time we want to combine the plots of Bitcoins and NFTs in one plot. We can arrange this with ggplot’s facet_wrap function that allows us to combine plots of different variables. The only “complication” we need to address is that we need to rearrange the dataset we want to plot. We can again use dplyr for this.

We first select the variables of interest (the cluster label, NFTs, and Bitcoins) and then reshape the data frame from wide to long format with pivot_longer. We can then use ggplot again. This time we use geom_bar(stat = 'identity') instead of geom_col; both draw bars whose heights are the values in the data. facet_wrap now tells ggplot to make one plot per variable and to stack the plots in a single column (ncol = 1). By setting scales to “free_y” we allow a separate y-axis for each panel, given that the scales vary substantially across the two variables.
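
To make the reshaping step more tangible, here is a minimal sketch on a made-up table with only two clusters. The object names `wide` and `long` and all values are hypothetical and serve only to show what the long format looks like.

```r
library(tidyr)
library(tibble)

# Toy table of cluster means (made-up values, two clusters only)
wide <- tibble(
  k8Cluster = c(1L, 2L),
  NFTs      = c(1, 2),
  Bitcoins  = c(0.1, 0.2)
)

# Long format: one row per combination of cluster and variable
long <- pivot_longer(wide, cols = -k8Cluster,
                     names_to = "variable", values_to = "value")
long
```

The two value columns become four rows, with the former column names stored in `variable`. This is the layout facet_wrap needs in order to draw one panel per variable.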

Code
#Barplots of Fintech Investments

FinTech <- BankinCRMData.means.percluster_8k |>  
  select(k8Cluster, NFTs, Bitcoins) |> 
  pivot_longer(cols = -k8Cluster, 
               names_to = "variable", 
               values_to = "value")

ggplot(FinTech, aes(factor(k8Cluster), value))+
  geom_bar(stat='identity') + 
  xlab("Clusters") +
  facet_wrap(~variable,  ncol=1, scales = "free_y") +  
  geom_text(aes(label = round(value, digits = 1)), 
            size = 4, 
            colour = "white", 
            position = position_stack(vjust = 0.5)) +
  ggtitle("FinTech Cluster Means")

Insights:
  • From the inspection, we can see that clusters 4 and 5 show the most activity in NFT acquisitions and are also the most heavily invested in Bitcoin, which makes them even more suitable for AE Banking.

Let us now look at digital activities and compare digital and offline activities.

8.2 Digital vs. Offline activities by 8 Clusters

With the following code, we can inspect the means for

  • BranchVisits,
  • AppLogins,
  • ATMVisits,
  • TimeOnlineBanking,
  • SocialMediaInter,
  • InternetTrafficVolume.

As you can see from facet_wrap, we now include two columns.

Code
#Plots for Digital vs. Offline Life

DigLife = BankinCRMData.means.percluster_8k %>% 
  select(k8Cluster, BranchVisits, AppLogins, 
   ATMVisits, TimeOnlineBanking, SocialMediaInter, 
    InternetTrafficVolume) %>%
  pivot_longer(cols = -k8Cluster, 
               names_to = "variable", 
               values_to = "value")

ggplot(DigLife, aes(factor(k8Cluster), value))+
  geom_bar(stat='identity') + xlab("Clusters") +
  facet_wrap(~variable,  ncol=2, scales = "free_y") +  
  geom_text(aes(label = round(value, digits = 1)), size = 4, colour = "white", 
            position = position_stack(vjust = 0.5)) +
  ggtitle("Digital Life vs. Offline Life Cluster Means")

Insights:
  • The plot further confirms the strong digital affinity of clusters 4 and 5. Both show very few branch visits while showing strong activity in online banking, internet traffic, social media interest, and banking app logins. ATM usage does not follow the same pattern: in the means table, cluster 5 averages about 7.1 ATM visits, more than clusters 1 to 4.
  • While we can now be sure that customers from segments 4 and 5 have a high digital affinity and are thus likely to be interested in AE Banking, we should, in the next step, check the financial situation of these customers.

Let us first focus on average age, income, and household sizes.

8.3 Socio-Economics by 8 clusters

Code
#Plots for Socio Economic Factors 

SocioEcon <- BankinCRMData.means.percluster_8k %>% 
  select(k8Cluster, Age, Income, HouseholdSize) %>%
  pivot_longer(cols = -k8Cluster, names_to = "variable", values_to = "value")

ggplot(SocioEcon, aes(factor(k8Cluster), value))+
  geom_bar(stat='identity') + xlab("Clusters") +
  facet_wrap(~variable,  ncol=1, scales = "free_y") +  
  geom_text(aes(label = round(value, digits = 1)), 
            size = 4, colour = "white", 
            position = position_stack(vjust = 0.5)) +
  ggtitle("Socio-Economic Cluster Means")

Insights
  • The plots reveal the problems with socio-economic clustering, as the results for age and household size do not vary too much across the 8 clusters.

  • We see some variation for income, where clusters 4 and 5 remain close to the total mean of the dataset, indicating that the digital-affine users identified are neither poor nor rich, making them still a suitable target group.

  • Age-wise, we similarly see that both segments consist of well-established adults in their late 30s or early 40s.

  • Given that the socio-economic information indicates that the digitally affine users benefit from stable incomes, we should in the next steps focus on spending and investment behavior to understand whether these segments offer sufficient business volume and growth potential.
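
The statement that a cluster stays close to the overall mean can also be checked numerically rather than read off the plot. The following is a minimal sketch, assuming a made-up customer-level data frame `toy` with three clusters. The names `overall`, `income_check`, and `rel_dev` are hypothetical helpers, not objects from the case study.

```r
library(dplyr)

# Toy customer-level data (made-up incomes, three clusters)
toy <- tibble(
  k8Cluster = c(1L, 1L, 2L, 2L, 3L, 3L),
  Income    = c(180000, 200000, 70000, 90000, 30000, 30000)
)

# Grand mean over all customers
overall <- mean(toy$Income)

# Cluster means and their relative deviation from the grand mean
income_check <- toy |>
  group_by(k8Cluster) |>
  summarise(mean_income = mean(Income)) |>
  mutate(rel_dev = (mean_income - overall) / overall)

income_check
```

A relative deviation near zero marks a cluster close to the average customer. The grand mean is computed from the customer-level data on purpose: averaging the cluster means instead would ignore that clusters differ in size.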

8.4 Spending and Investments by 8 clusters

Code
#Plots for Spending and Investments

Invest <- BankinCRMData.means.percluster_8k |>  
  select(k8Cluster, MortageVolume, StockVolume, 
         NASDAQInvest, USAXSFundInvest)|> 
  pivot_longer(-k8Cluster, names_to = "variable", values_to = "value")

ggplot(Invest, aes(factor(k8Cluster), value))+
  geom_bar(stat='identity') + xlab("Clusters") +
  facet_wrap(~variable,  ncol=2, scales = "free_y") +  
  geom_text(aes(label = round(value, digits = 1)), 
            size = 2, colour = "white", 
            position = position_stack(vjust = 0.5)) +
  ggtitle("Investment Cluster Means")

Code
Spending <- BankinCRMData.means.percluster_8k |>  
  select(k8Cluster, AccountSpending, 
         CreditCardSpending, GrocerySpending) |> 
  pivot_longer(cols = -k8Cluster, names_to = "variable", values_to = "value")

ggplot(Spending, aes(factor(k8Cluster), value))+
  geom_bar(stat='identity') + xlab("Clusters") +
  facet_wrap(~variable,  ncol=1, scales = "free_y") +  
  geom_text(aes(label = round(value, digits = 1)), 
            size = 3, colour = "white", 
            position = position_stack(vjust = 0.5)) +
  ggtitle("Spending Cluster Means")

Insights
  • Investment:

    • From the inspection of the two plots, it becomes evident that clusters 4 and 5 are more invested in stocks than their counterparts and, compared to the other clusters, also show lower levels of mortgages.

    • Looking at the types of investments, we see that cluster 4 is more invested in NASDAQ-listed companies than all other clusters, while cluster 6 is strongly invested in Stratton’s fund for small and mid-size US companies.

  • Spending:

    • Spending behavior information indicates that both segments comprise customers who spend comparatively little, with cluster 4 showing the lowest credit card turnover of all clusters.

    • In the case of grocery expenditures, we see cluster 5 being the cluster with the second-highest average spending behavior.

  • Last, we can enrich our insights by looking at the living conditions of the different segments and seeing where they are located. To achieve this, we finally compare residential information.

8.5 Living Conditions by 8 clusters

Code
#Plots Residential Information

Life <- BankinCRMData.means.percluster_8k |>  
  select(k8Cluster, CityAreaSize, MeanCitySqFtPrice, 
         MeanCityHouseHoldSize, MeanCityIncome) |> 
  pivot_longer(cols = -k8Cluster, names_to = "variable", values_to = "value")

ggplot(Life, aes(factor(k8Cluster), value))+
  geom_bar(stat='identity') + xlab("Clusters") +
  facet_wrap(~variable,  ncol=2, scales = "free_y") +  
  geom_text(aes(label = round(value, digits = 1)), 
            size = 2, colour = "white", 
            position = position_stack(vjust = 0.5)) +
  ggtitle("Life Conditions Cluster Means")

Insights:
  • From the plot we learn that clusters 4 and 5 both prefer city areas with mid-to-high levels of population.

  • In the case of cluster 4, the average household sizes in the residential areas are rather small, while in the case of cluster 5 we observe larger households with an average of about 4 members.

  • Looking at income distributions and the areas’ prices per square foot, we learn that cluster 4 lives in rather richer neighborhoods with higher square-foot prices, whereas cluster 5 members prefer middle-class neighborhoods with affordable, low square-foot prices.

Question 4
  • Combining the information at hand, how do you depict members of clusters 4 and 5 and how do you believe they differ from each other?
Task 2

Develop personas for the other clusters as well. Summarize all eight segments and their personas in a table. Use a summary table, where clusters are listed in the columns and various variables are listed in the rows (e.g., digital activities, offline activities, socio-economic variables, investment, spending, living conditions, etc.). Lastly, label each cluster (cluster name) to best describe the persona.
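
As a starting point for the layout of such a table, the following minimal sketch turns a table of cluster means into a format with variables in rows and clusters in columns. It uses a made-up table `means` with three clusters and two variables; the object names and all values are hypothetical, and the persona labels still have to be added by hand.

```r
library(dplyr)
library(tidyr)

# Toy table of cluster means (made-up values)
means <- tibble(
  k8Cluster   = c(1L, 2L, 3L),
  AppLogins   = c(10, 55, 65),
  ServiceFees = c(45, 15, 25)
)

# Variables in rows, clusters in columns
persona_table <- means |>
  pivot_longer(-k8Cluster, names_to = "Variable", values_to = "value") |>
  pivot_wider(names_from = k8Cluster, values_from = value,
              names_prefix = "Cluster")

persona_table
```

Since the gt package is already loaded, the resulting data frame could then be passed to gt() for a formatted table.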

9 Taking Actions from Insights

The results of the cluster analysis allow Stratton AE Banking to take several important marketing actions. First, a profound understanding of the available market segments allows the joint venture to determine which segments in the existing customer base should serve as the basis for future marketing activities.

To develop suitable positioning strategies for each cluster and subsequently develop communication campaigns, one can utilize the additional insights gained from the cluster analysis and the comparison of the cluster-specific means of the remaining variables.

Furthermore, the results of the cluster analysis can also be used to predict the interests and preferences of new customers. Here, one may use the available information on a new customer and calculate the Euclidean distances between that customer and the center of each cluster (i.e., the vector of cluster means across all dimensions). The customer most likely belongs to the cluster with the smallest distance.
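
The assignment rule described above can be sketched in a few lines of base R. The cluster centers and the new customer below are made up for illustration, and the names `centers`, `new_customer`, `distances`, and `assigned` are hypothetical. With a model fitted by kmeans(), the centers would typically be taken from the `$centers` element of the fitted object.

```r
# Made-up cluster centers: one row per cluster, one column per variable
centers <- rbind(
  c(Age = 58, AppLogins = 10),
  c(Age = 26, AppLogins = 55),
  c(Age = 37, AppLogins = 115)
)
rownames(centers) <- c("1", "2", "3")

# Hypothetical new customer, measured on the same variables and scale
new_customer <- c(Age = 35, AppLogins = 100)

# Euclidean distance from the new customer to every center:
# subtract the customer from each row, square, sum per row, take the root
distances <- sqrt(rowSums(sweep(centers, 2, new_customer)^2))

# The assigned cluster is the one with the smallest distance
assigned <- names(which.min(distances))
assigned
```

One caveat: if the clustering was run on standardized variables, the new customer has to be standardized with the same means and standard deviations before the distances are computed. Otherwise variables on large scales, such as Income, would dominate the result.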

References

Tibshirani, Robert, Guenther Walther, and Trevor Hastie. 2001. “Estimating the Number of Clusters in a Data Set Via the Gap Statistic.” Journal of the Royal Statistical Society Series B: Statistical Methodology 63 (2): 411–23. https://doi.org/10.1111/1467-9868.00293.